Agent Modelling in Partially Observable Domains
Abstract
Monitoring selectivity is a key challenge faced by agents when modelling other agents: an agent cannot continually monitor others, because such monitoring and modelling is computationally expensive, yet failing to monitor and model them increases its uncertainty about their state. Monitoring selectivity is also crucially important when agents plan in the presence of action and observation uncertainty. Formally, this paper focuses on an agent that uses a POMDP to plan its activities in a multiagent setting, and illustrates the critical nature of the monitoring selectivity challenge in POMDPs. The paper presents heuristics that limit the amount of monitoring and modelling of other agents, exploiting the reward structure and transition probabilities to automatically determine where to curtail such monitoring and modelling. We concretely illustrate our techniques in the domain of software personal assistants, and present some initial experimental results illustrating the efficiency...
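The abstract gives no pseudocode, so the following Python sketch is our own illustration, not the paper's algorithm; all names, the toy models, and the threshold rule are assumptions. It pairs a standard POMDP belief update with a crude value-of-information test that consults the reward structure, in the spirit of the heuristics the abstract describes, to decide whether monitoring another agent is worth its cost.

```python
# A minimal sketch of selective monitoring in a POMDP (illustrative only):
# the agent keeps a belief over another agent's state and triggers a costly
# "monitor" observation only when uncertainty could change which action is best.
import numpy as np

def belief_update(belief, T, O, action, obs):
    """Standard POMDP belief update: b'(s') ∝ O[s', obs] · Σ_s T[s, action, s'] b(s)."""
    predicted = belief @ T[:, action, :]        # predict next-state distribution
    updated = predicted * O[:, obs]             # weight by observation likelihood
    return updated / updated.sum()

def should_monitor(belief, R, monitor_cost):
    """Monitor only if the reward at stake under the current belief exceeds
    the cost of monitoring: a rough value-of-information proxy."""
    expected_rewards = belief @ R               # expected reward per action
    regret_bound = expected_rewards.max() - expected_rewards.min()
    return regret_bound > monitor_cost

# Toy usage: 3 states of the other agent, 2 actions, 2 observations.
T = np.full((3, 2, 3), 1.0 / 3.0)                    # transition probabilities
O = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])   # observation model
R = np.array([[1.0, 0.0], [0.2, 0.2], [0.0, 1.0]])   # reward per (state, action)
b = np.array([0.6, 0.3, 0.1])

if should_monitor(b, R, monitor_cost=0.4):
    b = belief_update(b, T, O, action=0, obs=1)
```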
Similar papers
Learning Policies with External Memory
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in which the agent’s actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a r...
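The snippet above describes the stigmergic mechanism only in prose. As a hedged illustration (all names are hypothetical; this is not the paper's code), the idea can be expressed as a thin wrapper in which memory-bit actions are intercepted and the bits are appended to the observation the policy conditions on:

```python
# A minimal sketch of the stigmergic external-memory idea (assumptions only):
# the action set is extended with set/clear operations on external memory bits,
# and those bits are appended to the observation the policy sees.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ExternalMemoryWrapper:
    n_bits: int
    bits: List[int] = field(default_factory=list)

    def __post_init__(self):
        self.bits = [0] * self.n_bits

    def augment(self, env_obs: Tuple[int, ...]) -> Tuple[int, ...]:
        # The policy conditions on (environment observation, memory bits).
        return env_obs + tuple(self.bits)

    def apply(self, action: str) -> bool:
        # Memory actions are handled here; other actions go to the environment.
        if action.startswith("set_"):
            self.bits[int(action[4:])] = 1
            return True
        if action.startswith("clear_"):
            self.bits[int(action[6:])] = 0
            return True
        return False  # not a memory action

mem = ExternalMemoryWrapper(n_bits=2)
mem.apply("set_0")
print(mem.augment((3, 1)))   # -> (3, 1, 1, 0)
```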
Robot Navigation in Partially Observable Domains using Hierarchical Memory-Based Reinforcement Learning
In this paper, we attempt to find a solution to the problem of robot navigation in a domain with partial observability. The domain is a grid-world with intersecting corridors, where the agent learns an optimal policy for navigation by making use of a hierarchical memory-based learning algorithm. We define a hierarchy of levels over which the agent abstracts the learning process, as well as it...
Risk-Sensitive Planning in Partially Observable Environments
The Partially Observable Markov Decision Process (POMDP) is a popular framework for planning under uncertainty in partially observable domains. Yet, the POMDP model is risk-neutral in that it assumes that the agent is maximizing the expected reward of its actions. In contrast, in domains like financial planning, it is often required that the agent's decisions be risk-sensitive (maximize the utility o...
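The snippet is truncated, but one common way to make an agent risk-sensitive (a standard exponential-utility formulation, not necessarily this paper's) is to maximize the expected exponential utility of return rather than its expectation. A sketch comparing the two objectives on sampled returns:

```python
# Risk-neutral vs. risk-sensitive (exponential utility) objectives, as an
# illustration of the distinction drawn in the snippet above. All numbers
# are made up; this is not the paper's model.
import numpy as np

def risk_neutral_value(returns):
    return np.mean(returns)                      # E[R]

def risk_sensitive_value(returns, gamma):
    # Exponential utility: (1/gamma) * log E[exp(gamma * R)].
    # gamma < 0 is risk-averse: bad outcomes are penalized more heavily.
    return np.log(np.mean(np.exp(gamma * np.asarray(returns)))) / gamma

safe  = [1.0, 1.0, 1.0, 1.0]     # low-variance policy returns (mean 1.0)
risky = [5.0, 5.0, -2.0, -2.0]   # higher mean (1.5) but high variance

print(risk_neutral_value(risky) > risk_neutral_value(safe))                # True
print(risk_sensitive_value(risky, -1.0) < risk_sensitive_value(safe, -1.0))  # True
```

A risk-neutral planner prefers the risky policy for its higher mean, while a risk-averse one (gamma < 0) prefers the safe policy, which is exactly the behaviour gap the snippet motivates.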
Improving Uncoordinated Collaboration in Partially Observable Domains with Imperfect Simultaneous Action Communication
Decentralised planning in partially observable multi-agent domains is limited by the interacting agents’ incomplete knowledge of their peers, which impacts their ability to work jointly towards a common goal. In this context, communication is often used as a means of observation exchange, which helps each agent in reducing uncertainty and acquiring a more centralised view of the world. However,...